RIVEST.DON[UP,DOC] - www.SailDart.org

perm filename RIVEST.DON[UP,DOC] blob sn#453178 filedate 1979-06-30 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00020 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00003 00002	A word of explanation...
C00005 00003	2/28/79 -- Sum + weighted product.
C00006 00004	2/28/79 -- Bin-packing
C00007 00005	3/01/79 -- 2-SAT
C00009 00006	3/02/79 -- Data compression
C00010 00007	3/03/79 -- Estimating size of tree
C00012 00008	3/04/79 -- Bit-wise reverse
C00013 00009	3/05/79 -- Binary subtree
C00014 00010	3/07/79 -- Worst case of GCD
C00015 00011	3/08/79 -- Row sort / column sort
C00016 00012	3/09/79 -- Shellsort
C00017 00013	3/10/79 -- Binary search with one false test
C00019 00014	3/11/79 -- Pipelined binary search
C00021 00015	3/12/79 -- Search with no upper bound
C00022 00016	3/13/79 -- Search with paid-for test values
C00024 00017	3/14/79 -- Uninitialized sparse array
C00025 00018	3/15/79 -- Interchanging unequal memory blocks
C00026 00019	3/18/79 -- Complex multiplication
C00027 00020	3/19/79 -- Semi-complete digraph
C00028 ENDMK
C⊗;
A word of explanation...

Around the beginning of March, 1979, Ron Rivest (RIVEST@MIT-ML) began
sending a series of problems to the MIT system notices, under the heading
"Theory Problem of the Day".  The problems varied in difficulty, though
none are outrageously hard.  I have collected them all here for easy access
by SAIL people; each page of this file contains a single problem along with
the date on which it was first posted.  The directory lines give some idea
of the subject of each problem.

You're welcome to send comments to me (DON), but please don't send me the
answers.  (I haven't finished working them all out myself yet!)  RIVEST@ML
is probably also amenable to correspondence.
2/28/79 -- Sum + weighted product.

Given:	A set of positive real numbers P0, P1, ..., Pn
	A set of real numbers	       S0, S1, ..., Sn
		such that 0 < Si < 1 for all i
	A real number T
	A real number C
Problem: To determine if there exists a subset {i1, ..., ik}
	 of the set {0, 1, ... n} such that
		C > (Pi1 + ... + Pik) + T * (Si1 * ... * Sik)
Question: How hard is this problem?  Is it NP-complete?

(problem posed by David Johnson @ Bell Labs)
2/28/79 -- Bin-packing

Let X = {x1, ..., xn} be the weights of n objects with 0 < xi < 1 .
We wish to pack the objects into a minimum number of bins, subject to:
	(i) At most two objects can be put in the same bin.
	(ii) xi and xj can share a bin if and only if xi + xj <= 1
(A) Design an O(n log(n)) algorithm that produces an optimal packing for
    any X with n objects.
(B) Prove that the packing produced by your algorithm indeed uses the
    minimum number of bins possible.

(From Stanford Qualifying Exam, Spring 1978)
3/01/79 -- 2-SAT

A formula of the propositional calculus is said to be in k-CNF
if it is the conjunction of clauses, where each clause is the disjunction
of at most k literals. (A literal is a variable or its complement.)

For example,  (X v ~Z) ∧ (~Y v ~X) ∧ (Z v Y)  is in 2-CNF.

Let k-SAT be the problem of determining whether a given formula in
k-CNF is satisfiable. (A formula is satisfiable iff there is an assignment
of TRUE/FALSE to each variable that makes the formula evaluate to TRUE.)

It is well known that 3-SAT is NP-complete.

Show that 2-SAT is easy: give an algorithm for 2-SAT which runs in time
polynomial in the length of the input formula.
3/02/79 -- Data compression

One may sometimes wish to store a large file (e.g. a video image) in a 
compressed format without losing too much of the original data.

Show how to compress a 23N-bit file down to 12N bits, such that when the 
original file is recreated from the reduced version at least 20N bits 
out of the 23N will be correct.  

Hint: Use the Golay (23,12,7) perfect error-correcting code in a clever way. 
This code contains 4096 codewords of length 23 bits which are at Hamming
distance 7 from each other.
3/03/79 -- Estimating size of tree

Suppose one would like to estimate the number  N  of leaves of a finite tree
which is too large to exhaustively search.  For example, estimating the number
of legal chess games can be formulated as such a problem.

Imagine taking a random path from the root of the tree to a leaf, where at each
step you decide randomly to go to one of the sons with probability  1/d , if
the node you are at has degree	d  (has  d  sons).  Let  P  be the product of
the degrees  d	of all the nodes you visited on the way.

Prove that  P  is a statistically valid estimate of  N , in the sense that
the expected value of  P  is exactly the number  N  of leaves in the tree.
3/04/79 -- Bit-wise reverse

Suppose your computer has N-bit words and an N-bit accumulator,
plus the usual LOAD/STORE, SHIFT (left or right up to N bits),
AND, OR, and NOT instructions.

Show how to BIT-WISE REVERSE a word using only O(log(N)) instructions.

(Suggested by Vaughan Pratt)
3/05/79 -- Binary subtree

The following lemma turns out to be very useful in proving results
about evaluating arithmetic expressions on a parallel machine.
Supply the proof (it's simple).

Lemma. Every binary tree with N leaves has an internal node X such
that the subtree rooted at X has between N/3 and 2N/3 leaves.
3/07/79 -- Worst case of GCD

The proof of the following lemma implies that the "worst case"
for the usual GCD algorithm is two consecutive Fibonacci numbers.
  Note 1:  GCD(M,N) = if N=0 then M else GCD(N, M mod N).
  Note 2:  FIB(0), FIB(1), FIB(2),... = 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ... .

Lemma.	If GCD requires k recursive calls to itself to compute
	GCD(M,N) where M>N, then  M >= FIB(k+1).

Supply the simple inductive proof.
3/08/79 -- Row sort / column sort

Let A[i,j] be a matrix of real numbers.

Suppose we sort each row of A, so that A[i,j] < A[i,j+1] for all relevant i,j,
and then sort each column of A so that A[i,j] < A[i+1,j] similarly.

Prove that A remains sorted by rows even after it is sorted by columns
(i.e. A ends up sorted by both rows and columns).
3/09/79 -- Shellsort

The following variation on the preceding "rows & columns" problem
is the basis of the "Shellsort" sorting algorithm.

Let A be a vector of real numbers.  To "p-sort" A is to sort, for each
k in the range 0 to p-1, the subsequence of the form  { A[k],  A[k+p],
A[k+2*p], ...  }  so that  A[k] <= A[k+p] <= A[k+2*p] <= ... .

Prove that if A is p-sorted and then q-sorted, it remains p-sorted.
3/10/79 -- Binary search with one false test

Let n be a number known to both you and me, and suppose I choose
a number x between 1 and n, inclusive.	It is well known that
you can determine x by asking me at most  ceiling(lg(n))  questions of
the form "Is x less than or equal to y?" for various values of y.
(This is ordinary binary search.  Ceiling(z) is the least integer not
less than z.  Lg(n) is the logarithm to the base two of n.)

Finding this game to be quickly boring, we modify the rules so that I
am now allowed to lie in response to at most one of your questions.
How should you organize your strategy to minimize the number of questions?
Can you show that only	2lg(n)+O(1) questions suffice?	How about
lg(n) + sqrt(lg(n)) + O(1) ?  Or even  lg(n) + lg(lg(n)) + O(1) ?
3/11/79 -- Pipelined binary search

Here's another variation on the "discover my number" game.

As before, I have a number x in mind between 1 and n inclusive which you
are to find out by asking questions of the form "Is x <= y ?" for
various values of y.  In this variation I will always tell the truth,
but I won't answer your i-th question until you tell me what the
(i+1)-st question is.  (This models a certain kind of pipelined machine.)

Find a strategy for this game which uses no more than log(n)+O(1) questions
in the worst case, where the base of the logarithm is the golden ratio
G = 1.61803... . (Note that H**2 + H = 1.0, where H = 1/G = 0.61803... .)
3/12/79 -- Search with no upper bound

The previous binary search games have been over the finite range 1 to n.
Suppose instead that I am allowed to pick an arbitrary positive integer x
(i.e. there is no upper bound on what x may be).

Show that you can determine x using in the worst case at most
    lg(x) + lg(lg(x)) + lg(lg(lg(x))) + lg(lg(lg(lg(x)))) + ...
questions of the form "Is x<=y?" for various y.
(Lg is log to the base two, with lg(z)=0 if z<1.)
3/13/79 -- Search with paid-for test values

Suppose as before that I have chosen an arbitrary integer x which you
wish to determine by asking questions of the form "Is x<=y?" for various y,
but that we measure the worst-case cost of your strategy, not by the number
of questions you ask, but by the work you do to compute the y-values in your
questions.  Specifically, you are given the number 0 for free, but must pay
$1 to buy a number which is one greater than one you already possess, or to
buy a number which is double a number that you already possess.  The y-values
you use in your questions must of course belong to you. (sales receipt req'd.)

Show that you can determine x for  $ (log(x))**2  in the worst case.

(Suggested by Vaughan Pratt.)
3/14/79 -- Uninitialized sparse array

Sometimes it is desirable to use a large block of memory or disk space without
initializing it, especially if it is only going to be sparsely used (e.g. a
hash table or sparse matrix).  The trouble, of course, is distinguishing the
good data from the garbage when accessing the array.

Show that it is possible to use an array without initializing it, by making
use of a small auxiliary table of size proportional to the number of valid
array entries. The time to insert an item into the array or to retrieve an item
must be O(1) (i.e. constant time).
3/15/79 -- Interchanging unequal memory blocks

Show how to interchange two unequal size and possibly far-apart blocks
of storage using n swaps, where n is the size of the smallest block
containing those two blocks.  (Since you are going to have to
relocate the data in between the two blocks anyway, owing to their
unequal size, this is about as good as you can hope for.)
Think of the problem as reconfiguring five blocks ABCDE as ADCBE where
B and D are the blocks to be interchanged.

(Suggested by Vaughan Pratt)
3/18/79 -- Complex multiplication

The usual method of multiplying the complex numbers
	X = A + Bi	and	Y = C + Di
to produce the product
	Z = E + Fi	where	E = AC - BD and F = AD + BC
requires four real multiplications (AC, BD, AD, and BC), one addition and
one subtraction,  to produce E and F from A, B, C, and D.
	Show that Z can in fact be computed using only three real
multiplications.  (The number of additions/subtractions may increase.)
3/19/79 -- Semi-complete digraph

	You have a finite directed graph G such that either there is an edge
X-->Y or an edge Y-->X in G (but not both) for each pair of distinct vertices
X, Y.  Give simple inductive proofs for the following facts:
   (a) G must contain a vertex Z such that for every other vertex W
       either  Z --> W or there is a vertex U such that  Z-->U-->W.
   (b) The n vertices of G can be renumbered as X1, ..., Xn so that
       X1 --> X2 --> X3 --> ... --> Xn.  (i.e. G contains a Hamiltonian path.)